Fuzzy Proximity Ranking with Boolean Queries

نویسندگان

  • Michel Beigbeder
  • Annabelle Mercier
چکیده

Based on the idea that the closer the query terms are in a document, the more relevant this document is, we experiment an IR method based on a fuzzy proximity degree of the query term occurences in a document to compute its relevance to the query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean IR model, it does not explicitly use a proximity operator. The fuzzy term proximity is controlled with an influence function. Given a query term and a document, the influence function associates to each position in the text a value dependant on the distance of the nearest occurence of this query term. To model proximity, this function is decreasing with distance. Different forms of function can be used: triangular, gaussian etc. For practical reasons only functions with finite support were used. The support of the function is limited by a constant called k. The fuzzy term proximity functions are associated to every leaves of the query tree. Then fuzzy proximities are computed for every nodes with a post-order tree traversal. Given the fuzzy proximities of the sons of a node, its fuzzy proximity is computed, like in the fuzzy IR models, with a mimimum (resp. maximum) combination for conjunctives (resp. disjunctives) nodes. Finally, a fuzzy query proximity value is obtained for each position in this document at the root of the query tree. The score of this document is the integration of the function obtained at the tree root. For the experiments, we modified Lucy (version 0.5.2) to implement our IR model. Three query sets are used for our eight runs. One set is manually built with the title words and some description words. Each of these words is OR’ed with its derivatives like plurals for instance. Then the OR nodes obtained are AND’ed at the tree root. The two automatic query sets are built with an AND of automatically extracted terms from either the title field or the description field. These three query sets are submitted to our system with two values of k: 50 and 200. As our method is aimed at high precision, it sometimes give less than one thousand answers. In such cases, the documents retrieved by the BM-25 method implemented in Lucy was concatenated after our result list.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

XML Fuzzy Ranking

This paper proposes a method of ranking XML documents with respect to an Information Retrieval query by means of fuzzy logic. The proposed method allows imprecise queries to be evaluated against an XML document collection and it provides a model of ranking XML documents. In addition the proposed method enables sophisticated ranking of documents by employing proximity measures and the concept of...

متن کامل

ENSM-SE at CLEF 2005: Uses of Fuzzy Proximity Matching Function

Based on the idea that the closer the query terms in a document are, the more relevant this document is, we propose a information retrieval method based on a fuzzy proximity degree of term occurences to compute document relevance to a query. Our model is able to deal with Boolean queries, but contrary to the traditional extensions of the basic Boolean information retrieval model, it does not ex...

متن کامل

Fuzzy Term Proximity With Boolean Queries at 2006 TREC Terabyte Task

We report here the results of fuzzy term proximity method applied to Terabyte Task. Fuzzy proxmity main feature is based on the idea that the closer the query terms are in a document, the more relevant this document is. With this principle, we have a high precision method so we complete by these obtained with Zettair search engine default method (dirichlet). Our model is able to deal with Boole...

متن کامل

Enabling Data Retrieval : by Ranking and Beyond

The ubiquitous usage of databases for managing structured data, compounded with the expanded reach of the Internet to end users, has brought forward new scenarios of data retrieval. Users often want to express non-traditional fuzzy queries with soft criteria, in contrast to Boolean queries, and to explore what choices are available in databases and how they match the query criteria. Conventiona...

متن کامل

ENSM-SE at CLEF 2006: AdHoc Uses of Fuzzy Proximity Matching Function

Starting from the idea that the closer the query terms in a document are to each other the more relevant the document, we propose an information retrieval method that uses the degree of fuzzy proximity of key terms in a document to compute the relevance of the document to the query. Our model handles Boolean queries but, contrary to the traditional extensions of the basic Boolean information re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005